# ULS: A Dual- $V_{th}$ /High- $\kappa$ Nano-CMOS Universal Level Shifter for System-Level Power Management

SARAJU P. MOHANTY University of North Texas and DHIRAJ K. PRADHAN University of Bristol

Power dissipation is a major bottleneck for emerging applications, such as implantable systems, digital cameras, and multimedia processors. Each of these applications is essentially designed as an Analog/Mixed-Signal System-on-a-Chip (AMS-SoC). These AMS-SoCs are typically operated from a single power-supply source, a battery providing a constant supply voltage. To reduce the power dissipation of AMS-SoCs, multiple-supply voltage and/or variable-supply voltage is an attractive low-power design approach. In multiple-/variable-supply voltage AMS-SoCs, the DC-to-DC voltage-level shifter is a critical component, and it becomes an overhead when its own power dissipation is high. In this article a new DC-to-DC voltage-level shifter, called the Universal Level Shifter (ULS), is introduced that performs level-up shifting, level-down shifting, and blocking of voltages. The ULS is a unique component that reduces the dynamic power and leakage of AMS-SoCs while facilitating their reconfigurability. System-level architectures for three AMS-SoCs, namely a Drug Delivery Nano-Electro-Mechanical-System (DDNEMS), a Secure Digital Camera (SDC), and a Net-centric Multimedia Processor (NMP), are introduced to demonstrate the use of the ULS for system-level power management. The article presents a design flow and an algorithm for the optimal design of the ULS using a dual- $V_{th}$ /high- $\kappa$ technique for its efficient realization. A prototype ULS is presented for the 32nm nano-CMOS technology node. The robustness of the ULS design is examined through three types of analysis: parametric, load, and power. It is observed that the ULS produces a stable output for voltages as low as 0.35V and loads varying from 50 fF to 120 fF. The average power dissipation of the ULS with an 82 fF capacitive load is $5\mu W$ .

This research is supported in part by NSF award numbers CCF-0702361 and CNS-0854182. Authors' addresses: S. P. Mohanty, Department of Computer Science and Engineering, University of North Texas, 1155 Union Circle no. 311277, Denton, TX 76203-5017; email: saraju.mohanty@unt.edu; D. K. Pradhan, Department of Computer Science, University of Bristol, UK; email: pradhan@compsci.bristol.ac.uk.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org. © 2010 ACM 1550-4832/2010/06-ART8 \$10.00

DOI 10.1145/1773814.1773819 http://doi.acm.org/10.1145/1773814.1773819

Categories and Subject Descriptors: B.7.1 [Integrated Circuits]: Types and Design Styles—VLSI (very large scale integration); advanced technologies; C.5.4 [Computer Systems Organization]: Computer System Implementation—VLSI systems

General Terms: Design, Performance, Reliability, Security

Additional Key Words and Phrases: Power management, Analog/Mixed-Signal System-on-a-Chip (AMS-SoC), low-power design, nanoscale CMOS, system-level power management, DC-to-DC voltage-level shifter, dual-threshold voltage, high- $\kappa$ /metal-gate nano-CMOS

#### **ACM Reference Format:**

Mohanty, S. P. and Pradhan, D. K. 2010. ULS: A dual- $V_{th}$ /high- $\kappa$  nano-CMOS universal level shifter for system-level power management. ACM J. Emerg. Technol. Comput. Syst. 6, 2, Article 8 (June 2010), 26 pages.

DOI = 10.1145/1773814.1773819 http://doi.acm.org/10.1145/1773814.1773819

## 1. INTRODUCTION

Real-life emerging applications including implantable systems, digital cameras, and multimedia processors are essentially designed as Analog/Mixed-Signal System-on-Chips (AMS-SoCs). Particular examples of such AMS-SoCs are Drug-Delivery Nano-Electro-Mechanical Systems (DDNEMS) [Mohanty et al. 2009a], Secure Digital Cameras (SDC) [Mohanty et al. 2007a, 2005], and Net-centric Multimedia Processors (NMP) [Mohanty et al. 2009b]. These AMS-SoCs are typically operated from a single power-supply source, a battery providing a constant supply voltage. To be effective, these AMS-SoCs must have the following attributes: (1) low power dissipation, (2) fault tolerance, and (3) reconfigurability and field upgradability. This article addresses the low-power-dissipation aspect of these AMS-SoCs through system-level power management.

High power dissipation is the primary bottleneck for AMS-SoCs targeted at portable applications. It has many side effects, such as reduced battery lifetime and increased operating temperature, which in turn requires a heat-transfer mechanism. In the DDNEMS these side effects are more severe: reduced battery life may necessitate frequent surgical intervention by doctors, and an increased operating temperature requires a heat-transfer mechanism that affects the part of the body where the system is implanted. Power dissipation is also an important issue for the design of the NMP. In particular, when the NMP is integrated in portable devices like mobile phones, power dissipation becomes a paramount issue. In a mobile phone running mobile TV, the battery is the primary system constraint, and battery life is critical for the success of mobile TV. Similarly, power dissipation is an important constraint for the SDC when it is deployed in critical applications like video surveillance in remote places.

There is a need for integrated power management methods to reduce power consumption in these AMS-SoCs. When the AMS-SoCs are realized using nano-CMOS technology, the major components of total power dissipation are gate-oxide leakage, subthreshold leakage, and dynamic power [Mohanty and Kougianos 2007; Kougianos and Mohanty 2009; Ghai et al. 2008]. These power dissipation sources depend on the supply voltage, either linearly or quadratically. Dynamic power management techniques with variable-supply voltage (variable- $V_{DD}$ ) are used for system-level power reduction, whereas multiple-supply voltage (multi- $V_{DD}$ ) is a static solution for switching power reduction in Application-Specific Integrated Circuits (ASICs).

A typical portable system is realized as an AMS-SoC that is supplied with power from a single battery source. This article discusses a special type of DC-to-DC level shifter, called the ULS. The ULS is suitable for power management and field programmability of such AMS-SoCs and can also be used as a standard cell in the low-power design of ASICs. Efficient design of the ULS is critical to reduce its overhead on the circuits it is designed to serve. This article uses a cutting-edge technology, high- $\kappa$ /metal-gate nano-CMOS, for the design of the ULS. The high- $\kappa$ dielectric is used to contain the gate leakage, while dual- $V_{th}$ technology is used to contain the subthreshold leakage. The use of high- $\kappa$ serves the dual purpose of enabling device scaling and reducing gate leakage. Hence high- $\kappa$ /metal-gate transistors are a good alternative to classical transistors at nano-CMOS technology nodes [Chau et al. 2000; Choi et al. 2002; Ghai et al. 2009].

The salient features of this article are as follows.

- (1) Three representative reconfigurable applications are introduced, each of which can be realized as a multiple-supply voltage-based multicore Analog/Mixed-Signal System-on-a-Chip (AMS-SoC). The key components of these representative multiple-supply voltage-based AMS-SoCs are identified.
- (2) To address the most pressing challenge, power dissipation, the universal DC-to-DC voltage-level shifter (ULS) is introduced.
- (3) A novel design flow for energy-efficient design of a ULS circuit is proposed.
- (4) An algorithm is presented for the simultaneous power, leakage, and delay optimization of the ULS circuits.
- (5) A dual- $V_{th}$ technique is applied to the high- $\kappa$ /metal-gate ULS circuit for its power and delay optimization.
- (6) A 32nm high-κ/metal-gate CMOS ULS is realized and thoroughly characterized for power dissipation, delay, and load.

The rest of this article is organized as follows: Section 2 introduces the architectures of representative systems that require low power dissipation and programmability, along with the concept of the ULS. Section 3 discusses the design of the ULS using high- $\kappa$ /metal-gate nano-CMOS technology. Section 4 presents the optimization algorithm used for the efficient design of the ULS. Section 5 discusses the functional simulation and characterization of the ULS. Section 7 presents conclusions and directions for future research.

## 2. REPRESENTATIVE EXAMPLES OF EMERGING SYSTEMS

In this section three representative emerging systems are introduced, each using the ULS for power management. Each system needs different supply voltages for the operation of its individual components while being supplied a constant voltage from a single battery.



Fig. 1. The system architecture of a DDNEMS. The solid lines represent the power buses and the dotted line the data and control buses. The modules are digital, mixed-signal, RF, or nonelectrical. The battery provides a constant supply voltage of V, whereas the system needs different discrete voltage levels  $V_a, \ldots, V_j$ . The individual components, for example, the digital circuits, are intrinsically designed as a multiple-voltage-based circuit.

### 2.1 Drug-Delivery Nano-Electro-Mechanical Systems (DDNEMS)

Strong interest in improving the quality of human life catalyzes research in the area of self-health management. Typical conventional drug delivery schemes suffer many drawbacks that seriously limit their effectiveness for self-health management. NEMS are a technological solution for building miniature systems which can be beneficial in terms of safety, efficacy, or convenience [Wolbring; Staples et al. 2006]. The goal of NEMS-based drug delivery is to administer drugs in predetermined targets and doses using implantable chips which are controlled or programmed externally through a radio frequency interface. Figure 1 presents an architecture for the DDNEMS, the typical components of which are now discussed [Mohanty et al. 2009a].

The Power Management Unit (PMU) is one of the most important components of the entire DDNEMS. It manages the power distribution to the various subsystems to reduce energy consumption, using the control signals from the Digital Signal Processor (DSP) and the stored microcode. It has built-in timers that put the system into "sleep" or "wake-up" mode, and it can be induced to activate the system via external signals received by the RF subsystem (to force an emergency drug delivery, for example). The heart of the PMU is a ULS bank. The ULS bank supplies the different operating voltages required by the various DDNEMS subsystems from a single battery, and it also facilitates reconfigurability.

The key component of the DDNEMS is the drug delivery subsystem, which is typically nonelectrical in nature. To allow for redundancy, fault tolerance, load sharing, and multiple drugs, the subsystem itself needs to be designed as an array. The array is expected to be heterogeneous, that is, the elements of the array are quite diverse. The different array elements in the DDNEMS include micropumps, microfluidic devices, stents, and microneedles. The array elements have appropriate transducers to facilitate their control and interfacing to the electrical portions of the DDNEMS.

The data processing, controlling, and interfacing functions of the DDNEMS are handled by electrical subsystems, which are analog, digital, or mixed-signal circuits. The monitoring and control of the drug array is performed by the sensor subsystem which communicates through the transducers. Its front-end (transducer side) is analog but at the back-end, interfacing to the DSP is digital. The DSP subsystem analyzes the online data generated by the sensors and, using the program stored in the flash memory subsystem, generates control signals for drug delivery, facilitating fault tolerance, load sharing, and drug mixing. The system monitoring subsystem continually polls the various electrical subsystems and transducers to obtain a snapshot of the DDNEMS's functionalities. It alerts the DSP to initiate appropriate actions upon the discovery of faults or errors.

The RF subsystem, which comprises an antenna and a transmitter/receiver, is built using RFID principles for the shape and placement of the antenna and for the communication protocol. Its function is to facilitate noninvasive maintenance of the system (e.g., modification of the microcode stored in the flash memory), remote collection of data (e.g., amount of drug remaining in the reservoir, drug array element failures, or battery status), and emergency drug delivery or system deactivation.

### 2.2 Secure Digital Camera (SDC)

Digital media transmitted or displayed through digital TV broadcast, Compact Discs (CDs), Digital-Video Discs (DVDs), personal computers, smart phones, and Personal Digital Assistants (PDAs) offer several distinct advantages over analog media, including high visual quality and easy processing. The ease with which digital media can be tampered with gives rise to the need for Digital Rights Management (DRM) [Memon and Wong 1998; Eskicioglu and Delp 2001; Cox and Miller 2002]. Digital watermarking is used along with encryption to provide dual-layer copyright protection through DRM [Eskicioglu and Delp 2001; Macq and Quisquater 1995]. Watermarking embeds extra information, called a watermark, into multimedia (e.g., an image, audio, or video) such that the watermark can later be used to make an assertion about the host. Many software implementations of DRM algorithms are available, but very few attempts have been made at hardware-based DRM. Hardware-based DRM is necessary for low-power, real-time, high-reliability, and low-cost applications, and also for easy integration with existing consumer-electronic applications [Mathai et al. 2003a, 2003b; Kougianos et al. 2009]. For example, DRM chips can be integrated with any digital camera [Mohanty et al. 2004; 2007a; Adamo 2006; Adamo et al. 2006a, 2006b]. The hardware modules can also be integrated with a JPEG codec [Mohanty et al. 2003], which can be a part of a scanner, a digital camera, or any multimedia device, so that the multimedia content is secured at the source, right at capture time. The high-level system architecture of the Secure Digital Camera (SDC) and its main components are shown in Figure 2 [Mohanty 2005, 2007a; Adamo et al. 2006a, 2006b].

In the SDC, the image is captured by an image sensor (a.k.a. Active Pixel Sensor, APS) and converted to a digital signal by the Analog-to-Digital Converter (ADC). A CMOS image sensor with an embedded ADC (a.k.a. Digital Pixel Sensor, DPS) can also be used. The captured image is stored temporarily in the scratch memory, after which it is displayed on the LCD panel by the controller. The purpose of the LCD panel is to let the user see the image before it is processed by the watermarking or encryption units and stored in the camera, from where it can be transmitted over the network or transferred to flash memory, a computer hard drive, or optical discs. The controller is responsible for coordinating the entire sequence of events. Both invisible-robust and visible watermarking algorithms are used along with encryption and data compression (an image compression unit such as JPEG). The choice of operations performed on the image depends on the user of the camera. The security of the image in the SDC depends on the encryption unit, for example, one based on the Advanced Encryption Standard (AES) algorithm.

Fig. 2. System architecture of the Secure Digital Camera (SDC). The individual units are designed to operate at discrete supply voltages $V_a$, $V_b$, ..., $V_n$. The digital units are designed to operate at two discrete supply voltages. The smart controller provides the different supply voltages using the ULS bank. The solid lines represent data lines and dashed lines represent control lines.

One of the specific applications of the SDC is the electronic passport [Mohanty et al. 2007a; Adamo et al. 2006a, 2006b; Adamo 2006]. The SDC can invisibly watermark biometric information, such as an "iris image", "handwritten signature", or "fingerprint", into an individual's picture, which can then be added to the passport. The watermarking is key based, and this key is encrypted and then embedded as a visible watermark in the form of a barcode on the picture image. The robustness of the invisible watermark and the authenticity of the picture image are based on the secret key. The biometric data cannot be accessed and extracted unless the secret key is known. At the same time, the secret key for the invisible watermarking process cannot be known unless it is decrypted. Hence, the SDC offers double protection to the biometric data embedded into the picture image. The SDC also addresses privacy concerns of the owners of the biometric data.

Several attempts have been made to realize different components of the SDC. A trustworthy camera that restores credibility to photographic images using encryption is presented in Friedman [1993]. This camera produces two output files representing the captured image and the "digital signature" of the captured image. A Biometric Authentication System (BAS) in the framework of an SDC is presented in Blythe and Fridrich [2004]; however, hardware architectures are not proposed. A design for a CMOS Active Pixel Sensor (APS) with the pseudorandom number generation capability needed for watermarking is presented in Nelson et al. [2005]. Industry has produced cameras with watermarking capabilities, for example, the Epson PhotoPC 3000Z and 800Z models and the Kodak DC-200 and DC-260; however, these cameras were discontinued for unknown reasons [Blythe and Fridrich 2004].

Fig. 3. High-level representation of the architecture of the NMP. The PE scheduler and voltage scheduler work in coordination for reconfiguration and power management. The individual units are designed to operate at discrete supply voltages $V_a, V_b, \ldots, V_i$ .

### 2.3 Net-Centric Multimedia Processor (NMP)

Information in the form of video is preferred over other forms of multimedia for its combined audio-visual effect, a preference well supported by the significant growth of the Internet and high-bandwidth communications [Emmanuel and Kankanhalli 2003; Cherry 2005]. Video is the hardest type of multimedia information to deal with because, as a three-dimensional signal, it has extensive memory and computational requirements. Video is made available and transmitted using many video compression standards, such as MPEG-4 [Bhargava et al. 2004; Richardson 2003; Sikora 1997], H.264 [Richardson 2003], and VC-1. Thus, there is a need for a system for integrated video compression, encryption, and watermarking that works well with these video coding standards. The Net-centric Multimedia Processor (NMP) is such a system [Mohanty et al. 2009b; Tarigopula 2008]. The architecture of the NMP is shown in Figure 3 [Mohanty et al. 2009b; Tarigopula 2008]. The NMP has built-in facilities for real-time multimedia information security or DRM. An NMP can be integrated in any networked multimedia processing equipment (e.g., mobile phones or sensor networks) to facilitate Internet Protocol (IP) packet processing and multimedia information processing without the use of a main Central Processing Unit (CPU). The NMP will be useful for several critical applications, like video surveillance, video over IP, and IP-TV [Cherry 2005; Jain 2005; Alfonsi 2005].

The NMP system consists of several Processing Elements (PEs), each with dedicated functionality, all connected through an internal bus. This bus forms the physical communication channel among the PEs as well as the other components of the NMP. Packet classification is a computationally intensive task that is carried out by the packet classifier in the NMP. The packet classifier reads the header of an incoming packet, determines the stream to which the packet belongs, selects the outgoing interface, and passes the packet to the appropriate PE for further processing. The outgoing packet is dynamically buffered by the packet scheduler until it is sent to the outgoing link. The instruction and control memory is used to store the instructions corresponding to the functions that will be executed using the NMP. The data memory is used to buffer the data, and an appropriate mechanism is needed to avoid data conflicts among the PEs. Input and output interfaces are ports through which the NMP communicates with other systems or the CPU.

Real-time packet classification is needed for the NMP. The design of the packet classifier exploits the structure and characteristics of packet classification rules [Kounavis et al. 2003; Nourani and Faezipour 2006]. The packet scheduler is needed to control different traffic streams and to determine the streams' quality [Zhang et al. 2000]. A wide range of scheduling algorithms whose hardware implementations are needed for the NMP are described in Xu and Lipton [2002]. For low power dissipation, each PE in the NMP can be designed to operate at a finite set of supply voltages in the range $V_1$ to $V_m$, where m is a natural number and $V_m$ is the maximum supply voltage [Mohanty et al. 2006]. The PE scheduler activates and deactivates each PE, depending on the application of the NMP. The inactive PEs are shut off with a switching mechanism to reduce leakage power [Hu et al. 2004]. The ULS is specifically useful for reducing dynamic power as well as standby leakage. The voltage scheduler dynamically assigns the operating voltage of each PE depending on the traffic load and application requirements so that power and delay specifications are met. These units together provide a real-time DRM facility in the NMP. The sequence in which they are used depends on the application and the location of the NMP in the IP network cloud. The compression unit implements one of the video compression standards, such as H.264, MPEG-4, or VC-1.

### 2.4 Use of the ULS for Reconfiguration and Power Management

In the multi- $V_{DD}$  AMS-SoC design, once individual units and processing elements are designed, the next issue is integrating them. The ULS is used for such integration in static or dynamic fashion. The high-level representation of the ULS is shown in Figure 4 [Mohanty et al. 2007b, 2009a; Ghai et al. 2008; Vadlamudi 2007]. It has an input voltage signal called  $V_{in}$ , two control signals S1 and S0, two supply voltages  $V_{DDh}$  and  $V_{DDl}$ , and an output voltage signal  $V_{out}$ . The control signals decide which functionality is to be performed by the ULS. Depending on the control signal, the input voltage  $V_{in}$  is transformed to the output voltage  $V_{out}$ . Table I presents the truth table which defines the functionality of the ULS and can be used for programming the ULS.

The ULS is capable of performing four types of operations on the voltage signal, as needed for power management in AMS-SoCs: (1) level-up shifting, (2) level-down shifting, (3) signal-passing (no shifting), and (4) signal-blocking. Voltage-level up-shifting is defined as shifting a low-voltage signal to a high-voltage level, whereas voltage-level down-shifting is defined as shifting a high-voltage signal to a low-voltage level. Passing the signal means bypassing it to the other side of the network without performing any operation on it. Blocking means completely stopping the input signal from appearing at the other side. The ULS is programmed for any of these four functionalities depending on the requirement.

Fig. 4. High-level representation of the Universal Level Shifter (ULS).

Table I. Control Signals for Programmability or Reconfiguration

| S1 | S0 | Functionality       |
|----|----|---------------------|
| 0  | 0  | Signal-Blocking     |
| 0  | 1  | Level-Down Shifting |
| 1  | 0  | Level-Up Shifting   |
| 1  | 1  | Signal-Passing      |

The type of functionality to be performed is selected using the two control signals. Level-down shifting is used to supply the blocks of the subsystems that operate at lower than the battery voltage. Level-up shifting is applied as an interface where lower-supply-voltage cells drive higher-supply-voltage cells, or to supply subsystems operating at higher than the battery voltage. The blocking feature of the ULS is used to shut off the unused blocks of a circuit in standby mode, thereby reducing standby leakage. The ULS is programmed according to different requirements; however, not all of the supported operations are needed every time. A combination of two operations, for example, block and step-down, is needed for dynamic power management. For static power management, one operation is performed at a time, with the ULS used as a single standard cell, in which case the pass-signal operation is not needed. AMS-SoCs may use level-up shifting with blocking to reduce short-circuit power and leakage power. AMS-SoCs may also use level-down shifting with blocking to minimize switching power, in addition to standby leakage.
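To make Table I concrete, the following is a minimal behavioral (logic-level) sketch of the four ULS functions in Python. It is not a transistor-level model of the ULS circuit; the names `ULSMode` and `uls_output`, the mid-swing decision threshold, the use of `None` to represent a blocked (disconnected) output, and the example rail voltages are all hypothetical assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class ULSMode(Enum):
    """ULS functions keyed by the select signals (S1, S0), as in Table I."""
    BLOCK = (0, 0)
    LEVEL_DOWN = (0, 1)
    LEVEL_UP = (1, 0)
    PASS = (1, 1)


def uls_output(s1: int, s0: int, v_in: float, v_in_swing: float,
               vddh: float, vddl: float) -> Optional[float]:
    """Ideal output voltage of a hypothetical behavioral ULS.

    v_in is the input voltage and v_in_swing is the full-swing level of the
    input voltage domain. A blocked output is represented by None.
    """
    mode = ULSMode((s1, s0))  # raises ValueError for an invalid (S1, S0) pair
    if mode is ULSMode.BLOCK:
        return None  # the input never appears at the output
    if mode is ULSMode.PASS:
        return v_in  # passed through without changing the voltage level
    # Assumption: a logic '1' is detected above half of the input swing.
    logic_one = v_in > v_in_swing / 2
    target_rail = vddh if mode is ULSMode.LEVEL_UP else vddl
    return target_rail if logic_one else 0.0


if __name__ == "__main__":
    VDDH, VDDL = 0.9, 0.45  # hypothetical supply rails in volts
    print(uls_output(1, 0, 0.45, VDDL, VDDH, VDDL))  # level-up of a logic '1'
    print(uls_output(0, 1, 0.9, VDDH, VDDH, VDDL))   # level-down of a logic '1'
    print(uls_output(0, 0, 0.9, VDDH, VDDH, VDDL))   # blocked input
```

Encoding the select bits directly as the enum values keeps the Table I mapping in one place, so an invalid (S1, S0) pair fails loudly instead of silently selecting a mode.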

Figure 5 illustrates the logical configuration of two PEs operating at two different supply voltages. It is a logical representation of the use of the ULS in multiple-supply voltage AMS-SoCs. The ULS is used in two different locations: one at the power supply and the other interfacing islands that operate at different voltages. The actual scenario may differ for (semi)custom design and Field-Programmable-Gate-Array (FPGA)-based design. The switch in real-life AMS-SoCs may be firmware or just control signals. The working principle of the configurable architecture shown in Figure 5 can be analyzed as follows. There are two different processing elements, PE<sub>1</sub> and PE<sub>2</sub>.






Fig. 5. System configuration for two supply voltage scenarios. There are several possible cases for configurability, as explained in the text. Two cases are represented, case(a) and case(b). In case(a), $PE_1$ operating at voltage $V^-$ drives $PE_2$ operating at V. In case(b), $PE_1$ operating at V drives $PE_2$ operating at $V^-$. Similar configurations can be shown for all other cases of configurability. The pass-signal unit transfers the signal without changing the voltage level, and the block-signal unit completely disconnects the signal from the PE, which helps reduce static power consumption. The solid arrow indicates signal flow and the dashed arrow indicates no flow.

Each PE can be operated at either of two discrete voltages, V and $V^-$, where V is the supply voltage and $V^- < V$. In this scenario, four configuration modes are possible: (a) $PE_1$ operating at voltage $V^-$ driving $PE_2$ operating at V; (b) $PE_2$ operating at voltage $V^-$ driving $PE_1$ operating at V; (c) $PE_1$ operating at voltage V driving $PE_2$ operating at $V^-$; and (d) $PE_2$ operating at voltage V driving $PE_1$ operating at $V^-$. For example, in case (a) in Figure 5(a), the supply voltage of $PE_1$ is derived from the supply voltage V by step-down level shifting. Since $PE_1$ operates at $V^-$ and $PE_2$ operates at V, step-up level shifting is needed between the two. Similarly, case (b) in Figure 5(b) can be analyzed, in which step-down shifting provides the power supply to $PE_2$ and step-up level shifting connects $PE_2$ to $PE_1$. Configurations for case (c) and case (d) can be analyzed in the same way; in these cases step-down level shifting is needed from the ULS. When both $PE_1$ and $PE_2$ operate at the same voltage, either both at $V^-$ or both at V, the ULS does not need to perform level shifting: the signal can be passed unchanged, and the block-signal unit of the ULS can be used to disconnect $PE_1$ and $PE_2$ from each other.
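The case analysis above reduces to a simple selection rule: compare the voltage of the driving PE with that of the receiving PE, and compare the supply V with each PE's operating voltage. The Python sketch below encodes that rule using the (S1, S0) encoding of Table I. The function names `signal_mode` and `supply_mode`, the `connected` flag, and the numeric values chosen for V and $V^-$ are hypothetical and used only for illustration.

```python
from enum import Enum


class ULSMode(Enum):
    """ULS functions keyed by (S1, S0), as in Table I."""
    BLOCK = (0, 0)
    LEVEL_DOWN = (0, 1)
    LEVEL_UP = (1, 0)
    PASS = (1, 1)


def signal_mode(v_driver: float, v_receiver: float,
                connected: bool = True) -> ULSMode:
    """ULS function for a signal crossing from a driving PE to a receiving PE."""
    if not connected:
        return ULSMode.BLOCK  # isolate the two PEs from each other
    if v_driver < v_receiver:
        return ULSMode.LEVEL_UP
    if v_driver > v_receiver:
        return ULSMode.LEVEL_DOWN
    return ULSMode.PASS  # same voltage: no level shifting needed


def supply_mode(v_supply: float, v_pe: float) -> ULSMode:
    """ULS function on the power path from the supply V to a PE."""
    # The supply-side ULS converts V into the PE's operating voltage.
    return signal_mode(v_supply, v_pe)


V, V_MINUS = 0.9, 0.6  # hypothetical values for V and V^- (volts)

# Cases (a)-(d): (driver, driver voltage, receiver, receiver voltage)
CASES = {
    "a": ("PE1", V_MINUS, "PE2", V),
    "b": ("PE2", V_MINUS, "PE1", V),
    "c": ("PE1", V, "PE2", V_MINUS),
    "d": ("PE2", V, "PE1", V_MINUS),
}

for case, (drv, v_drv, rcv, v_rcv) in CASES.items():
    print(f"case ({case}): supply->{drv}: {supply_mode(V, v_drv).name}, "
          f"supply->{rcv}: {supply_mode(V, v_rcv).name}, "
          f"{drv}->{rcv}: {signal_mode(v_drv, v_rcv).name}")
```

Running the loop reproduces the narrative above: in case (a), for instance, the supply path to PE1 is LEVEL_DOWN and the PE1-to-PE2 signal path is LEVEL_UP, while cases (c) and (d) need LEVEL_DOWN on the signal path.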

## 3. DESIGN OF ULS USING HIGH-κ/METAL-GATE NANO-CMOS

This section discusses the design flow for the ULS using high-κ/metal-gate nano-CMOS technology [Mohanty et al. 2007b, 2009a; Ghai et al. 2008; Vadlamudi 2007].

### 3.1 Models for Power and Delay Calculation of the ULS

3.1.1 *Power and Leakage Models*. The total power of a nano-CMOS circuit is calculated as the sum of its major components: dynamic power, subthreshold leakage, and gate leakage. The use of high- $\kappa$ /metal-gate nano-CMOS transistors in our design reduces gate leakage to a negligible level, so it is omitted. Thus, the power dissipation of the ULS circuit is calculated by the following expression.

$$P_{ULS} = P_{dynamic} + P_{subthreshold} \tag{1}$$

The dynamic power dissipation of the circuit, which depends on loading conditions, is calculated as follows [Rabaey et al. 2003; Mohanty et al. 2008]. We have

$$P_{dynamic} = \alpha \times C_L \times V_{DD}^2 \times f, \tag{2}$$

where $\alpha$ is the activity factor, $C_L$ is the total switched capacitive load, $V_{DD}$ is the supply voltage, and f is the clock frequency. This expression is derived from the energy consumed in charging and discharging a capacitor. This power dissipation depends on the loading conditions and not on the device features.
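As a quick numerical check of Eq. (2), the Python sketch below evaluates the dynamic power at two supply voltages. The 82 fF load is taken from the abstract, but the activity factor, clock frequency, and supply voltages are hypothetical values chosen for illustration, so the result is not meant to reproduce the reported power of the ULS. The point is the quadratic dependence: halving $V_{DD}$ cuts dynamic power by a factor of four.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, freq: float) -> float:
    """Eq. (2): P_dynamic = alpha * C_L * V_DD^2 * f (watts for SI inputs)."""
    return alpha * c_load * vdd ** 2 * freq


ALPHA = 0.5      # hypothetical activity factor
C_LOAD = 82e-15  # 82 fF load, as in the abstract
FREQ = 100e6     # hypothetical 100 MHz switching frequency

p_high = dynamic_power(ALPHA, C_LOAD, 0.9, FREQ)   # hypothetical V_DD = 0.9 V
p_low = dynamic_power(ALPHA, C_LOAD, 0.45, FREQ)   # half of that supply

print(f"P at 0.90 V: {p_high * 1e6:.3f} uW")
print(f"P at 0.45 V: {p_low * 1e6:.3f} uW")
print(f"ratio: {p_high / p_low:.2f}")
```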

The subthreshold leakage of a nano-CMOS device is calculated by the following expression [Sill et al. 2007].

$$I_{sub} = \mu_0 \left( \frac{\epsilon_{gate} W_{eff}}{T_{gate} L_{eff}} \right) \times v_{therm}^2 e^{1.8} \times \exp \left( \frac{V_{gs} - V_{th}}{S \times v_{therm}} \right) \times \left( 1 - \exp \left( \frac{-V_{ds}}{v_{therm}} \right) \right)$$

where $\mu_0$ is the zero-bias mobility, $\epsilon_{gate}$ is the dielectric constant of the gate dielectric, $T_{gate}$ is the gate dielectric thickness, $W_{eff}$ and $L_{eff}$ are the effective channel width and length, $V_{th}$ is the threshold voltage, $v_{therm}$ is the thermal voltage, S is the subthreshold swing factor, $V_{gs}$ is the gate-to-source voltage, and $V_{ds}$ is the drain-to-source voltage. From the preceding expression it is clear that increasing $T_{gate}$, increasing the length ($L_{eff}$), and/or reducing the width ($W_{eff}$) of the transistors reduces the subthreshold current. This leakage current depends exponentially on $V_{th}$, so increasing $V_{th}$ decreases the leakage current substantially.
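To show the exponential $V_{th}$ dependence that motivates a dual- $V_{th}$ design, the Python sketch below evaluates the subthreshold leakage expression for an off-state device ($V_{gs} = 0$) at two threshold voltages. All device values (mobility, geometry, the 100 mV $V_{th}$ difference, S = 1.5) are hypothetical, and $\epsilon_{gate}$ is taken as an absolute permittivity (21 times the vacuum permittivity) so that the result comes out in amperes. The leakage ratio between the two devices is $\exp(\Delta V_{th} / (S \, v_{therm}))$, independent of the geometry.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity (F/m)


def subthreshold_current(mu0: float, eps_gate: float, w_eff: float,
                         t_gate: float, l_eff: float, v_gs: float,
                         v_th: float, v_ds: float, s: float,
                         v_therm: float = 0.0259) -> float:
    """Subthreshold leakage I_sub (A) from the expression above, SI units.

    eps_gate is an absolute permittivity (F/m); v_therm defaults to kT/q
    at roughly 300 K.
    """
    geometry = mu0 * (eps_gate * w_eff) / (t_gate * l_eff)
    return (geometry * v_therm ** 2 * math.exp(1.8)
            * math.exp((v_gs - v_th) / (s * v_therm))
            * (1.0 - math.exp(-v_ds / v_therm)))


# Hypothetical off-state device (V_gs = 0), loosely in the 32nm regime.
params = dict(mu0=0.03, eps_gate=21 * EPS0, w_eff=64e-9, t_gate=5e-9,
              l_eff=32e-9, v_gs=0.0, v_ds=0.9, s=1.5)

i_low_vth = subthreshold_current(v_th=0.2, **params)   # hypothetical low V_th
i_high_vth = subthreshold_current(v_th=0.3, **params)  # hypothetical high V_th

print(f"I_sub, low V_th : {i_low_vth:.3e} A")
print(f"I_sub, high V_th: {i_high_vth:.3e} A")
print(f"leakage reduction factor: {i_low_vth / i_high_vth:.1f}")
```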

3.1.2 *Delay Model*. The delay of a CMOS circuit is approximately calculated using the following expression [Sill et al. 2007]. We have

$$D = \gamma \times \left( \frac{C_L \times V_{DD}}{\mu \times \left( \frac{\epsilon_{gate}}{T_{gate}} \right) \times \left( \frac{W_{eff}}{L_{eff}} \right) \times (V_{DD} - V_{th})^{\alpha}} \right), \tag{3}$$

where $\gamma$ is a technology-dependent constant, $\mu$ is the electron surface mobility, $\alpha$ is the velocity saturation index, which varies from 1.4 to 2 for nano-CMOS (not to be confused with the activity factor $\alpha$ in Eq. (2)), $\epsilon_{gate}$ is the dielectric constant of the gate dielectric, $T_{gate}$ is the gate dielectric thickness, $L_{eff}$ is the effective channel length, and $W_{eff}$ is the effective width of the transistors. Since both level-up shifting and level-down shifting take place in a ULS, the average propagation delay of the ULS $(D_{ULS})$ is defined as

$$D_{ULS} = \left(\frac{D_{up} + D_{down}}{2}\right),\tag{4}$$

assuming an equal number of level-up shifting and level-down shifting operations. $D_{up}$ and $D_{down}$ are the level-up shifting and level-down shifting delays, respectively. The delay of the ULS circuit is measured from the 50% level of the input swing to the 50% level of the output swing.
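A minimal sketch of Eqs. (3) and (4), assuming normalized, hypothetical parameter values. The velocity saturation index is named `alpha_sat` in the code to keep it distinct from the activity factor, and the helper names are illustrative rather than part of the article's flow. Because $\gamma$, $C_L$, $\mu$, and the geometry cancel, the ratio printed for a 20% higher $V_{th}$ depends only on $((V_{DD}-V_{th})/(V_{DD}-V_{th}'))^{\alpha}$.

```python
def gate_delay(gamma, c_load, vdd, mu, eps_gate, t_gate, w_eff, l_eff, vth, alpha_sat):
    """Eq. (3): gamma * C_L * V_DD / (mu * (eps/T) * (W/L) * (V_DD - V_th) ** alpha)."""
    if vth >= vdd:
        raise ValueError("Eq. (3) requires V_DD > V_th")
    drive = mu * (eps_gate / t_gate) * (w_eff / l_eff) * (vdd - vth) ** alpha_sat
    return gamma * c_load * vdd / drive


def uls_delay(d_up, d_down):
    """Eq. (4): mean of the level-up and level-down delays."""
    return (d_up + d_down) / 2.0


# Hypothetical normalized values; alpha_sat = 1.4 is the low end of the quoted range.
params = dict(gamma=1.0, c_load=1.0, vdd=0.7, mu=1.0, eps_gate=1.0,
              t_gate=1.0, w_eff=1.0, l_eff=1.0, vth=0.2, alpha_sat=1.4)

d_nom = gate_delay(**params)
d_high = gate_delay(**{**params, "vth": 0.24})  # V_th raised by 20%
print(f"delay penalty of a 20% higher V_th: {d_high / d_nom:.3f}x")
print(f"D_ULS for D_up = 1.0, D_down = 2.0: {uls_delay(1.0, 2.0)}")
```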

### 3.2 High-κ Nano-CMOS Modeling

For the design and simulation of the ULS using high-κ/metal-gate technology, selecting an appropriate transistor model is critical. Industrial-standard models are difficult to access at this point in time. In the absence of such models, the Predictive Technology Model (PTM) is used for design and simulation of the ULS [Zhao and Cao 2006]. The PTM is well established, predicts the general trends of device attributes, and captures the physics of the devices accurately. In the absence of published data and other device models, PTM provides a timely and effective analysis approach, and its results are of comparable accuracy to Technology Computer-Aided Design (technology CAD or TCAD) simulations, which are typically time and computation intensive. For PTM-based BSIM4 models, either of two methods can be used [Mukherjee et al. 2005]: (1) the model-card parameter EPSROX, which denotes the relative permittivity, is changed, or (2) the Equivalent Oxide Thickness (EOT) of the dielectric under consideration is calculated. The EOT is calculated so as to keep the ratio of relative permittivity to dielectric thickness constant, using the expression

$$T_{ox}^* = \left(\frac{\epsilon_{SiO_2}}{\epsilon_{gate}}\right) \times T_{gate}, \tag{5}$$

where $\epsilon_{gate}$ is the relative permittivity and $T_{gate}$ is the thickness of the gate dielectric material other than SiO<sub>2</sub>, while $\epsilon_{SiO_2}$ (= 3.9) is the dielectric constant of SiO<sub>2</sub>. In this article, $\epsilon_{gate}$ is taken as 21 to emulate an HfO<sub>2</sub>-based dielectric. The EOT is calculated to be 5nm for the 32nm node.
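Eq. (5) and its inverse are simple scalings, sketched below with $\epsilon_{SiO_2} = 3.9$ and $\epsilon_{gate} = 21$ as in the text. The function names and the 2.1nm layer thickness are hypothetical, chosen only to show that the conversion preserves $\epsilon/T$ (and hence the gate capacitance per unit area).

```python
EPS_SIO2 = 3.9  # relative permittivity of SiO2


def eot(t_gate, eps_gate=21.0):
    """Eq. (5): SiO2-equivalent thickness of a gate dielectric with permittivity eps_gate."""
    return (EPS_SIO2 / eps_gate) * t_gate


def physical_thickness(t_eq, eps_gate=21.0):
    """Inverse of Eq. (5): dielectric thickness that yields a given EOT."""
    return (eps_gate / EPS_SIO2) * t_eq


t_hk = 2.1  # hypothetical high-k thickness in nm (eps_gate = 21, as in the text)
t_eq = eot(t_hk)
print(f"EOT of a {t_hk} nm layer: {t_eq:.2f} nm")
# The ratio eps/T (capacitance per unit area) is unchanged by the conversion.
print(abs(21.0 / t_hk - EPS_SIO2 / t_eq) < 1e-12)
```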

### 3.3 The Design Flow Using Dual-$V_{th}$-Based High-$\kappa$ Nano-CMOS Technology

Algorithm 1 presents a design flow for the optimal design of the ULS using dual-$V_{th}$-based high-$\kappa$/metal-gate technology. It may be noted that in SiO<sub>2</sub>-based nano-CMOS technologies (particularly for sub-65nm nodes), gate-oxide leakage is a major contributor to power during the ON, OFF, and transient states of a circuit [Mohanty and Kougianos 2007]. This is overcome using the dual-oxide technique, as proposed in Ghai et al. [2008], which is a viable solution above the 45nm CMOS technology node. However, at sub-45nm technologies (e.g., 32nm in this article), this technique is not viable, and hence bulk-CMOS must be replaced by high-$\kappa$/metal-gate CMOS. This motivates the use of high-$\kappa$/metal-gate nano-CMOS for the design of the ULS to eliminate gate-oxide leakage.

#### Algorithm 1 Power-Delay Optimal ULS Design Methodology

- 1: Design and simulate the level-up shifting subcircuit of the ULS.
- 2: Design and simulate the level-down shifting subcircuit of the ULS.
- 3: Design and simulate the pass/block subcircuit of the ULS.
- 4: Stitch the subcircuits together to design the complete ULS circuit.
- 5: Eliminate gate leakage power by using high-$\kappa$/metal-gate nano-CMOS technology.
- 6: Perform functional simulation of the ULS to test its different functionalities and programmability using different input control signals.
- 7: Perform a reduced-transistor design of the ULS by eliminating any redundancy in the circuit, and perform functional simulations of the new circuit.
- 8: Obtain the netlist of the ULS and parameterize it for transistor width.
- 9: Rank the individual transistors of the ULS circuit in order of total power dissipation, accounting for subthreshold leakage.
- 10: Identify the power-hungry transistors, which collectively dissipate a designer-defined percentage of the total power.
- 11: Call the conjugate gradient algorithm to select the optimal width for all the transistors.
- 12: Assign high-$V_{th}$ to the power-hungry transistors to reduce the subthreshold leakage power dissipation.
- 13: Assign the new widths to all the transistors of the ULS circuit.
- 14: Perform the parametric, power, and load characterization of the final ULS circuit.
- 15: Perform process variation analysis to study the robustness of the final ULS circuit.
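Steps 9 and 10 of Algorithm 1 amount to a ranking followed by a cumulative-share cut-off. The sketch below assumes per-transistor power values are already available from simulation; the transistor names, power numbers, and the function `select_power_hungry` are hypothetical, and the article does not prescribe this exact selection rule beyond the designer-defined percentage.

```python
def select_power_hungry(power_by_transistor, fraction):
    """Rank transistors by power and return the smallest top-ranked set whose
    cumulative power reaches `fraction` of the total (Algorithm 1, steps 9-10)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(power_by_transistor.values())
    # Descending power; ties broken by name so the order is deterministic.
    ranked = sorted(power_by_transistor.items(), key=lambda kv: (-kv[1], kv[0]))
    selected, cumulative = [], 0.0
    for name, power in ranked:
        if cumulative >= fraction * total:
            break
        selected.append(name)
        cumulative += power
    return selected


# Hypothetical per-transistor power values in uW, for illustration only.
powers = {"M1": 1.2, "M2": 0.9, "M3": 0.4, "M4": 0.3, "M5": 0.2}
print(select_power_hungry(powers, 0.6))
```

Raising `fraction` flags more devices, trading further leakage reduction for a larger delay penalty once they receive a higher $V_{th}$.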

One prominent component of the total power dissipation in the ULS circuit is the subthreshold leakage. A dual-$V_{th}$ technique is adopted to reduce it [Wei et al. 1999]. A higher $V_{th}$ in a transistor leads to lower subthreshold current but increases the propagation delay. Hence, in this technique, only the power-hungry transistors are assigned a higher $V_{th}$, and the other transistors are left at the nominal $V_{th}$. It may be noted that while the dual-$V_{th}$ technique is well proven in digital circuits, its application to an analog circuit such as the ULS is distinctive to this design.

The total power dissipation (accounting for subthreshold leakage) and the delay of the entire ULS circuit are optimized using the optimization methodology presented in Section 4. The end result of this design flow is a thoroughly optimized ULS circuit for use in multi-$V_{DD}$ circuit and system environments.

### 3.4 Circuit-Level Design of the ULS

For level-up shifting, the Cross-Coupled Level Converter (CCLC) shown in Figure 6 is used. In this subcircuit, two cross-coupled PMOS transistors form the circuit load. The cross-coupled PMOS transistors act as a differential pair [Ishihara and Sheikh 2004]: when the output on one side is pulled low, the opposite PMOS transistor turns on and the output on that side is pulled high. Below the PMOS load are two NMOS transistors controlled by the input signal $V_{in}$. The CCLC is an asynchronous level shifter; in other words, it can be inserted anywhere in the circuit where voltage-level shifting is necessary. Because of this flexibility, the CCLC is one of the most commonly used designs for suppressing the DC current [Ishihara and Sheikh 2004]. It is well suited for use as a standard cell in multi-$V_{DD}$-based circuit design [Mohanty et al. 2006].

Fig. 6. Level-up shifting subcircuit showing baseline sizes for 32nm.

Fig. 7. Level-down shifting subcircuit showing baseline sizes for 32nm.

A differential-input level shifter subcircuit, shown in Figure 7, is used for voltage-level down shifting. Like the level-up shifting circuit, it consists of a cross-coupled PMOS pair. Its differential input enables stable, high-speed operation at low voltage [Kanno et al. 2000]. The differential input also offers immunity against power-supply bouncing [Sanchez et al. 1999], so that stable output voltages are maintained even under adverse supply conditions.

The blocking circuit completely prevents any voltage signal at the input side from appearing at the output side. This feature is crucial when total isolation from the input voltage signal is required to reduce standby leakage power. The blocking circuit is designed as a tristate buffer that uses a transmission gate [Mohanty et al. 2007b; Vadlamudi 2007]. The tristate buffer acts as a high-impedance circuit when it is not enabled; in this high-impedance state, the output is not driven by the circuit. The function of the passing circuit is to pass the input signal unchanged to the output side; in other words, it acts as a buffer between the input and output. The passing circuit is also designed using a transmission gate [Mohanty et al. 2007b; Vadlamudi 2007].

Fig. 8. Transistor-level circuit of the baseline ULS with 32 transistors.

Figure 8 shows a transistor-level circuit design of the ULS. It is obtained by stitching together the individual subcircuits that perform step-up shifting, step-down shifting, and pass/blocking. To achieve programmability, multiplexers are used wherever necessary. For circuit optimization, instead of a 4:1 multiplexer or three 2:1 multiplexers, the functionalities are achieved using two 2:1 multiplexers controlled by the signals S1 and S0. In the baseline circuit design, transistor sizes of W = 320nm, L = 32nm for the NMOS devices and W = 640nm, L = 32nm for the PMOS devices are chosen to achieve correct functionality of the ULS. By eliminating the redundant transistors, the reduced-transistor ULS circuit shown in Figure 9 is constructed. A further reduced-transistor ULS design is shown in Figure 10. In this design, a switch constructed using transmission gates is attached in front of the level-up shifting circuit and the level-down shifting circuit, and the output of the ULS is controlled by these switches. The number of transistors is reduced to 24, eliminating 8 transistors from the baseline design. The 24-transistor ULS (Figure 10) has two output nodes instead of the single output of the 28-transistor design (Figure 9). The choice between them depends on the application. The single-output 28-transistor ULS has more flexible programmability but larger area and power dissipation, and is more suitable for FPGA environments. On the other hand, the two-output 24-transistor ULS has less flexible programmability but smaller area and power dissipation, and is more suitable for Application-Specific Integrated Circuits (ASICs).

Each of the preceding subcircuits of the ULS, as well as the three variants of the ULS presented, is thoroughly tested and characterized through parametric, load, and power analysis. From a power-delay optimization point of view, each of the preceding variants of the ULS circuit (i.e., Figure 8, Figure 9, and Figure 10) can be subjected to optimization using the algorithm presented in the following section.

Fig. 9. Transistor-level circuit of the ULS with 28 transistors.

Fig. 10. Transistor-level circuit of the optimal design of the ULS with 24 transistors. The circled transistors are identified as power-hungry and subjected to the dual-$V_{th}$ technique.

## 4. DTCMOS-BASED OPTIMIZATION IN HIGH-$\kappa$ NANO-CMOS ULS

The dual-$V_{th}$ technique [Mohanty and Kougianos 2007; Wei et al. 1999] is used along with transistor sizing to achieve a power-delay optimized ULS. In the optimization algorithm, the power dissipation of the ULS is the objective function and its propagation delay is the constraint. Any of the three variants of the ULS circuit can be subjected to optimization; however, for brevity, the rest of the discussion in this article concerns the third alternative, the 24-transistor circuit of Figure 10.

First, the power-hungry transistors of the ULS circuit are identified and assigned higher $V_{th}$ values. Power-hungry NMOS transistors are assigned a 20% higher $V_{th}$ and power-hungry PMOS transistors a 50% higher $V_{th}$ than the nominal values specified for the technology node [Mohanty et al. 2010]. These transistors are marked with dashed circles in Figure 10. This reduces the power consumption considerably but increases the delay (Eq. (3)). Hence the transistor geometry is also explored, considering the widths of all the transistors in the level-up and level-down shifting subcircuits. In general, parameters such as L, W, and $V_{th}$ can all be sized during optimization [Ghai et al. 2008, 2009a]. However, for simplicity, only W is sized in this article; L is kept at its technology-defined nominal value and $V_{th}$ at its experimentally selected value.
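The $V_{th}$ assignment described above can be sketched as a lookup over the devices flagged as power-hungry. Only the 20% (NMOS) and 50% (PMOS) increases come from the text; the device names, the nominal $|V_{th}|$ of 0.2V, and the helper `assign_vth` are hypothetical. Threshold magnitudes are used so that "higher" means the same thing for both device types.

```python
NMOS_FACTOR, PMOS_FACTOR = 1.2, 1.5  # +20% for NMOS, +50% for PMOS (Section 4)


def assign_vth(devices, power_hungry, vth_nom_n, vth_nom_p):
    """Map each transistor to its |V_th|: raised if power-hungry, nominal otherwise.

    devices: dict of name -> "nmos" or "pmos"; thresholds are magnitudes in volts.
    """
    vth = {}
    for name, kind in devices.items():
        if kind == "nmos":
            nominal, factor = vth_nom_n, NMOS_FACTOR
        elif kind == "pmos":
            nominal, factor = vth_nom_p, PMOS_FACTOR
        else:
            raise ValueError(f"unknown device type {kind!r} for {name}")
        vth[name] = nominal * factor if name in power_hungry else nominal
    return vth


# Hypothetical device names and nominal |V_th| values (not from the PTM model cards).
devices = {"MN1": "nmos", "MN2": "nmos", "MP1": "pmos", "MP2": "pmos"}
vth_map = assign_vth(devices, {"MN1", "MP1"}, vth_nom_n=0.20, vth_nom_p=0.20)
for name, v in vth_map.items():
    print(f"{name}: |V_th| = {v:.3f} V")
```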

Algorithm 2 is used for the power dissipation (accounting for leakage) and delay optimization of the ULS circuit. The algorithm is based on the conjugate-gradient method [Hager and Zhang 2006; Ghai et al. 2009]. In its basic form, the conjugate-gradient method is an algorithm for the numerical solution of systems of linear equations whose matrix is symmetric and positive-definite; nonlinear variants such as that of Hager and Zhang [2006] extend it to the unconstrained minimization of smooth functions. The main advantages of the conjugate-gradient method are its low memory requirements and its convergence speed. The optimization approach is based upon feasible sequential quadratic programming. These properties are advantageous for analog circuits such as the ULS, whose complex netlists must be optimized.
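For reference, the linear form of the conjugate-gradient method mentioned above fits in a few lines of Python. This is a minimal sketch for a small symmetric positive-definite system, not the nonlinear CG-DESCENT routine or the circuit-level optimizer used in this article; the helper names are illustrative.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def conjugate_gradient(a, b, tol=1e-12, max_iter=None):
    """Solve a @ x = b for a symmetric positive-definite matrix a (list of rows)."""
    n = len(b)
    max_iter = max_iter if max_iter is not None else n  # n steps in exact arithmetic
    x = [0.0] * n
    r = [float(bi) for bi in b]   # residual b - a @ x for the initial guess x = 0
    p = r[:]                      # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)
        step = rs_old / dot(p, ap)  # exact line search along p
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # Next direction: new residual made A-conjugate to the previous direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(conjugate_gradient(A, b))  # exact solution is [1/11, 7/11]
```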

The inputs to the proposed algorithm consist of the circuit netlist, the objective set $\hat{F}$ ($P_{ULS}, D_{ULS}$) with its stopping criterion S (e.g., 1–5%), and the design variable set $\hat{D}$ with its lower design constraint $C_{lower}$ and upper design constraint $C_{upper}$. The lower design constraint $C_{lower}$ is $(\hat{D} - \Delta \hat{D})$, that is, $(W - \Delta W)$. The upper design constraint $C_{upper}$ is $(\hat{D} + \Delta \hat{D})$, that is, $(W + \Delta W)$.

The design variable set $\hat{D}$ in this article consists of the following: (1) $W_{PMOSup}$: width of PMOS transistors in the level-up shifting subcircuit; (2) $W_{NMOSup}$: width of NMOS transistors in the level-up shifting subcircuit; (3) $W_{PMOSdown}$: width of PMOS transistors in the level-down shifting subcircuit; and (4) $W_{NMOSdown}$: width of NMOS transistors in the level-down shifting subcircuit. The outputs of the algorithm are the optimized objective set $\hat{F}_{optimal}$ satisfying the stopping criterion S and the optimal values of the design variable set $\hat{D}_{optimal}$ within $C_{lower}$ and $C_{upper}$.

During optimization, a simulation is first performed using the initial values of $\hat{D}$, and the values of $\hat{F}$ are calculated to determine whether the initial values are feasible for the given $\hat{F}_{optimal}$. In each subsequent iteration, the values of the design variable set $\hat{D}$ are changed to move towards the required $\hat{F}_{optimal}$; this is called finite difference perturbation. The ULS circuit is then simulated again using the new design variable set. This process continues until $\hat{F}_{optimal}$ meets the stopping criterion S. The optimized objective set $\hat{F}_{optimal}$ is presented in Table II, and the $\hat{D}_{optimal}$ values are presented in Table III.

## 5. CHARACTERIZATION OF THE ULS CIRCUIT

This section discusses the functional simulation and characterization of the ULS circuit. The ULS circuit is characterized using three types of analysis: parametric, load, and power analysis to check the robustness of the design.

#### Algorithm 2 The Power-Delay Optimization Algorithm for ULS Circuit

```
1:  Input: Circuit netlist; objective set F_hat = [f_1, f_2, ..., f_n], i.e., [P_ULS, D_ULS];
        stopping criterion S; design variable set D_hat = [d_1, d_2, ..., d_n], i.e.,
        [W_PMOSup, W_NMOSup, W_PMOSdown, W_NMOSdown]; lower design constraints C_lower
        and upper design constraints C_upper on D_hat.
2:  Output: F_hat_optimal and D_hat_optimal for S = ±sigma, and the optimal ULS circuit,
        where 1% <= sigma <= 5% is a designer-defined error margin.
3:  Perform the initial simulation to obtain feasible values of the design variables
        for the given objective set.
4:  while (C_lower < D_hat < C_upper) do
5:      Use finite difference perturbation to generate a new set of design variables
            D_hat' = D_hat + delta(D_hat) for design space exploration.
        Compute the new objective set F_hat(D_hat') = [P_ULS, D_ULS].
6:      if (S == ±sigma) then    {i.e., the stopping criterion is within the error margin}
7:          return D_hat_optimal = D_hat', where
                D_hat' = [W_PMOSup, W_NMOSup, W_PMOSdown, W_NMOSdown].
8:      end if
9:  end while
10: Obtain optimal values for the design variable set D_hat_optimal.
11: Redesign the ULS circuit with the new variables.
12: Compute the optimized objective set F_hat_optimal for the ULS.
```
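The loop in Algorithm 2 can be illustrated with a toy, self-contained stand-in. The sketch below makes several assumptions that are not from the article: each of the four widths is normalized to a hypothetical 640nm reference, the box constraints are 0.1 to 1.0 of that reference, and the SPICE simulation is replaced by a hypothetical analytic surrogate whose width-proportional term stands in for power and whose inverse-width term stands in for delay (a weighted sum, not the article's delay constraint). The search direction comes from central finite-difference perturbations with a projected gradient step, which is a simpler technique than the conjugate-gradient method the article uses.

```python
def clip(x, lower, upper):
    """Project x back into the box C_lower <= x <= C_upper."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def fd_gradient(f, x, h):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def fd_box_descent(f, x0, lower, upper, sigma=1e-9, step=0.1, h=1e-4, max_iter=1000):
    """Projected descent on f; stops when the relative improvement drops below sigma."""
    x = clip(x0, lower, upper)
    fx = f(x)
    for _ in range(max_iter):
        g = fd_gradient(f, x, h)
        x_new = clip([xi - step * gi for xi, gi in zip(x, g)], lower, upper)
        f_new = f(x_new)
        if f_new > fx:          # overshoot: halve the step and retry from x
            step *= 0.5
            if step < 1e-12:
                break
            continue
        if fx - f_new <= sigma * abs(fx):   # stopping criterion S
            return x_new, f_new
        x, fx = x_new, f_new
    return x, fx


# Hypothetical surrogate for [W_PMOSup, W_NMOSup, W_PMOSdown, W_NMOSdown]:
# x_i stands in for power, k_i / x_i for delay. A SPICE run would replace this.
K = [0.0025, 0.09765625, 4.0, 0.25]


def surrogate(x):
    return sum(xi + ki / xi for xi, ki in zip(x, K))


lower, upper = [0.1] * 4, [1.0] * 4   # hypothetical normalized C_lower, C_upper
x_opt, f_opt = fd_box_descent(surrogate, [1.0] * 4, lower, upper)
print([round(v, 4) for v in x_opt], round(f_opt, 4))
```

Because each surrogate term $x_i + k_i/x_i$ is minimized at $\sqrt{k_i}$ clipped to the box, the expected answer is known in advance ([0.1, 0.3125, 1.0, 0.5]), which makes the sketch easy to check before a real simulator is plugged in.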

Table II. Optimized Values of Objective Set  $\hat{F}_{optimal}$ 

| Objective | Value    |
|-----------|----------|
| $P_{ULS}$ | $5\mu W$ |
| $D_{ULS}$ | 1.6ns    |

The functional simulation is the same for all three alternative ULS circuits, whereas the characterization results differ. For brevity, only the characterization of the 24-transistor ULS of Figure 10 is discussed in this section.

### 5.1 ULS Functional Simulation

Before constructing the overall ULS circuits, functional simulations of each subcircuit responsible for level-up shifting, level-down shifting, and pass/block were performed [Mohanty et al. 2007b, 2009a; Vadlamudi 2007]. The functional simulation of the ULS is shown in Figure 11. When the control signals S1 and S0 are "00", the input signal $V_{in}$ is blocked. When S1 and S0 are in the "01" state, $V_{in}$ is 0.7V ($V_{dd}$) and $V_{out}$ is 0.595V ($V_{ddl}$); that is, level-down shifting of the input voltage signal is performed. When S1 and S0 are in the "10" state, $V_{in}$ is 0.595V ($V_{ddl}$ = 85% of $V_{dd}$) and $V_{out}$ is 0.7V ($V_{dd}$); that is, level-up shifting is performed. Thus the three functions, level-up shifting, level-down shifting, and blocking, are selected by the values of S1 and S0, in agreement with the truth table in Table I. The ULS can therefore be programmed, for example, by external stimuli through the Radio-Frequency (RF) interface of DDNEMS.
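The programmability described above can be captured in an idealized behavioral model. The sketch below assumes logic-level inputs and ideal rails; the mode table follows the S1S0 codes given in the text, while the function name `uls_output` is hypothetical and the "11" code, which this section does not describe, is simply rejected.

```python
V_DD, V_DDL = 0.7, 0.595  # supply levels used in the functional simulation

# Mode selected by the control word "S1S0", following the functional simulation.
MODES = {"00": "block", "01": "down", "10": "up"}


def uls_output(s1s0, logic_in):
    """Idealized ULS output for a logic-level input (0 or 1).

    Returns None for the blocked (high-impedance) state, otherwise the output voltage.
    """
    if s1s0 not in MODES:
        # "11" is not described in this section, so it is rejected here.
        raise ValueError(f"unsupported control word {s1s0!r}")
    mode = MODES[s1s0]
    if mode == "block":
        return None
    rail = V_DDL if mode == "down" else V_DD  # down: V_DD -> V_DDL, up: V_DDL -> V_DD
    return rail if logic_in else 0.0


for word in ("00", "01", "10"):
    print(word, MODES[word], uls_output(word, 1))
```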

Table III. Design Variable Values $\hat{D}$ for Optimal Power and Delay

| $\hat{D}$      | $D_{optimal}$ | $C_{lower}$ | $C_{upper}$ |
|----------------|---------------|-------------|-------------|
| $W_{PMOSup}$   | 64nm          | 64nm        | 640nm       |
| $W_{NMOSup}$   | 640nm         | 64nm        | 640nm       |
| $W_{PMOSdown}$ | 64nm          | 64nm        | 640nm       |
| $W_{NMOSdown}$ | 640nm         | 64nm        | 640nm       |



Fig. 11. Functional simulation of the ULS circuit, verifying the truth table given in Table I and demonstrating the programmability of the ULS. The bottommost curve is the input, the two topmost curves are the outputs, and the middle two signals are the control signals. The sequence of operations is block, step-down, and step-up.

### 5.2 ULS Characterization

The ULS characterization for three types of analysis, namely parametric, load, and power analysis, is now presented. The ULS circuit is observed to be stable under varying operating conditions, and hence the design is robust.

5.2.1 *Parametric Analysis*. The parametric analysis tests the level-up shifting and level-down shifting of the ULS circuit. For level-up shifting, $V_{in}$ is varied from 0.1V to 0.595V in steps of 0.05V and the output of the ULS is observed. As shown in Figure 12(a), stable level-up shifting is performed for input voltages as low as 0.35V (50% of $V_{DD}$). For level-down shifting, $V_{in}$ is varied from 0.1V to 0.7V in steps of 0.05V. The output in Figure 12(b) shows that stable level-down shifting is performed for input voltages greater than 0.35V.

Fig. 12. Parametric analysis of the ULS showing the output $(V_{out})$ waveforms. It is evident that the ULS produces a constant output voltage even for varying input voltages.

5.2.2 *Load Analysis*. Load analysis is used to determine the excess load the ULS can drive. The ULS can be placed in any portion of a target circuit; thus it is important that the ULS operates under varying loading conditions. The nominal load capacitance $(C_L)$ is taken as 10 times the gate capacitance $(C_{gg})$ of the PMOS transistors in the ULS [Mukherjee et al. 2005]. Thus the following expression is used to calculate the load capacitance for high-$\kappa$ nano-CMOS technology:

$$C_{L} = 10 \times \left(\frac{\epsilon_{gate} \times W_{pmos} \times L_{pmos}}{T_{gate}}\right). \tag{6}$$
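Eq. (6) can be evaluated as follows, assuming $\epsilon_{gate}$ denotes the absolute permittivity $\kappa\epsilon_0$ with $\kappa = 21$. The PMOS geometry is the 640nm by 32nm baseline from Section 3.4, but the gate-dielectric thicknesses and the function name `load_capacitance` are hypothetical, so the printed values are not intended to reproduce the article's nominal load.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def load_capacitance(w_pmos, l_pmos, t_gate, k_gate=21.0, multiplier=10.0):
    """Eq. (6): C_L = 10 * eps_gate * W * L / T_gate, with eps_gate = k_gate * EPS0.

    Dimensions in metres; result in farads.
    """
    return multiplier * (k_gate * EPS0) * w_pmos * l_pmos / t_gate


# Baseline PMOS geometry from Section 3.4; the thicknesses below are hypothetical.
W_P, L_P = 640e-9, 32e-9
for t_gate in (1e-9, 2e-9):
    c_l = load_capacitance(W_P, L_P, t_gate)
    print(f"T_gate = {t_gate * 1e9:.0f} nm -> C_L = {c_l * 1e15:.2f} fF")
```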

The nominal value of $C_L$ is calculated as $82\,fF$. For the load analysis, the load capacitance is varied from $50\,fF$ to $120\,fF$ in steps of $10\,fF$. These values of load capacitance represent realistic loads [Yu et al. 2001]. The simulation results in Figure 13(a) and Figure 13(b) demonstrate that the ULS circuit produces a stable and expected output voltage under varying load conditions.



Fig. 13. Output under varying load conditions ( $C_L = 50\,fF$  to  $120\,fF$ ). ULS provides stable output voltage even though the loading condition changes.

5.2.3 *Power Analysis.* The power analysis of the ULS circuit is performed for three capacitive loading conditions: $50\,fF$, $82\,fF$, and $120\,fF$. Table IV shows the values obtained from analog simulations, together with the input rise/fall times and the switching frequency. The power dissipation varies only modestly with load, from $4.988\,\mu W$ at $50\,fF$ to $5.8\,\mu W$ at $120\,fF$. The power measurement includes the dynamic power and the subthreshold leakage of the ULS circuit. The gate leakage is measured to be negligible, as expected from the use of high-$\kappa$ nano-CMOS.
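The load sensitivity reported in Table IV can be quantified with a few lines of arithmetic on the tabulated values; the variable names are illustrative.

```python
# (load in fF, power in uW) pairs from Table IV.
table_iv = [(50, 4.988), (82, 5.0), (120, 5.8)]

(c_lo, p_lo), (c_hi, p_hi) = table_iv[0], table_iv[-1]
load_change = (c_hi - c_lo) / c_lo          # relative increase in load
power_change = (p_hi - p_lo) / p_lo         # relative increase in power
sensitivity = power_change / load_change    # fractional power change per fractional load change

print(f"load +{load_change:.0%}, power +{power_change:.1%}, sensitivity {sensitivity:.3f}")
```

Going from 50fF to 120fF (a 140% increase in load) raises the power by about 16%, a sensitivity of roughly 0.12.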

## 6. RELATED PRIOR RESEARCH ON LEVEL SHIFTER CIRCUIT DESIGN

A comparative perspective of selected related prior research on DC-to-DC voltage-level shifters is presented in Table V. The existing research is diverse in terms of functionality, CMOS technology node, and circuit features; thus it is discussed from a broad perspective without direct comparison. A level-down shifter with differential input pair operation is presented in Kanno et al. [2000]. In Yu et al. [2001], a Symmetrical Dual-Cascode Voltage Switch (SDCVS) is proposed that achieves a 50% reduction in short-circuit power and a 60% speed increase. In Kulkarni and Sylvester [2003], new level-converting circuits that consume 8–50% less energy than traditional techniques are proposed. In Ishihara and Sheikh [2004], up-shifters and down-shifters are used to minimize energy and delay. A level-up shifter using a Dual-Cascode Voltage Switch (DCVS) is presented in Yuan and Chen [2005]. In Sadeghi et al. [2006], only the issue of short-circuit power dissipation is handled. In Ghai et al. [2008], a universal level converter performing the functionalities of the ULS presented in this article is proposed for 90nm dual-oxide-thickness technology.

Table IV. Power Consumption of the 24-Transistor ULS

| Rise or Fall Time (ns) | Switching Frequency (MHz) | Capacitive Load (fF) | Power Dissipation $(\mu W)$ |
|------------------------|---------------------------|----------------------|-----------------------------|
| 10                     | 33.33                     | 50                   | 4.988                       |
| 10                     | 33.33                     | 82                   | 5                           |
| 10                     | 33.33                     | 120                  | 5.8                         |

Table V. Research on DC-to-DC Voltage-Level Shifters

| Research                      | Tech. | Power         | Delay   | Shifting Type |
|-------------------------------|-------|---------------|---------|---------------|
| [Kanno et al. 2000]           | 140nm | -             | 5ns     | Down          |
| [Yu et al. 2001]              | 350nm | $220.57\mu W$ | -       | Up            |
| [Kulkarni and Sylvester 2003] | 130nm | -             | -       | Up            |
| [Ishihara and Sheikh 2004]    | 130nm | -             | 127ps   | Up/Down       |
| [Yuan and Chen 2005]          | 180nm | -             | -       | Up            |
| [Sadeghi et al. 2006]         | 100nm | $10\mu W$     | 1ns     | Up            |
| [Mohanty et al. 2007b]        | 90nm  | $27.1\mu W$   | -       | Up/Down/Block |
| [Ghai et al. 2008]            | 90nm  | $12.26\mu W$  | 111.3ps | Up/Down/Block |
| This article                  | 32nm  | $5\mu W$      | 1.6ns   | Up/Down/Block |

The average power consumption of the ULS is $5\mu W$, the lowest of the power values reported in Table V. To the best of the authors' knowledge, this is the first reported level converter implemented using 32nm high-$\kappa$/metal-gate nano-CMOS technology. It can also be observed from Table V that, apart from the earlier universal designs of Mohanty et al. [2007b] and Ghai et al. [2008], the existing circuits perform up-shifting, down-shifting, or both, and are not programmable, unlike the ULS, which can perform multiple tasks and is programmable.

This archival journal article is based on the preliminary idea presented in Mohanty et al. [2009a]. In the current article, system-level energy management aspects as well as the energy-efficient design of the ULS are presented. Three representative systems, DDNEMS, SDC, and NMP, are discussed; they are needed in critical applications such as health care, DRM, and video broadcasting over IP. A formal representation of the design flow of the ULS is presented for high-$\kappa$/metal-gate nano-CMOS technology. The optimization algorithm for energy-efficient design of the ULS using dual-$V_{th}$ technology is thoroughly discussed. In a previous publication [Mohanty et al. 2007b], the general idea of a universal level shifter was introduced, whereas in Ghai et al. [2008] dual-$T_{ox}$ technology is used for energy-efficient design of the ULS.

## 7. SUMMARY, CONCLUSIONS, AND FUTURE RESEARCH

In this article, a new circuit called the ULS is presented for static as well as dynamic power management in multiple-supply-voltage ($V_{DD}$)-based AMS-SoC architectures. The ULS is applicable to scenarios where different supply voltages are needed from a single power supply. The ULS is capable of performing three distinct level-converting operations on the input signal: up-shifting, down-shifting, and blocking. This makes the proposed ULS highly suitable for dynamic power management in a multi-$V_{dd}$ AMS-SoC. The ULS can also be used for static power management (i.e., low-power design) in multi-$V_{dd}$-based circuits to connect islands operated at different voltage levels, and to disconnect the power supply when a portion of the circuit is not used.

As a specific realization, a 32nm high-$\kappa$/metal-gate-based design of the ULS is presented. The ULS circuit is subjected to further power minimization by applying a dual-$V_{th}$ technique. Finally, an algorithm is introduced and applied for the power and delay optimization of the entire ULS circuit. The robustness of the ULS circuit is tested using parametric, load, and power analysis. It is observed that a stable output is obtained for voltages as low as 0.35V and capacitive loads varying from $50\,fF$ to $120\,fF$.

Complementary to this research is IntellBatt, an array of batteries scheduled using a novel switching mechanism to provide the voltage levels needed [Mandal et al. 2008]. Such battery scheduling is also attractive for system-level power management and has been reported to extend battery life by 22%. Thus, a combined ULS and IntellBatt can be immensely useful for system-level power management, particularly in portable applications.

Based on the ULS idea, future research will consider gate-induced drain leakage (GIDL) in the optimization process. Physical design for 32nm high-$\kappa$ technology will be performed. It is also planned to design the ULS using other nanoscale technologies, such as Double-Gate FET (DGFET) and Carbon Nano-Tube FET (CNTFET), and to analyze the effects on the performance metrics.

## ACKNOWLEDGMENTS

The authors would like to acknowledge Suparna Vadlamudi and Dhruva Ghai, graduates of the University of North Texas.

## REFERENCES

- Adamo, O. B. 2006. VLSI architecture and FPGA prototyping of a secure digital camera for biometric application. M.S. thesis, University of North Texas.
- Adamo, O. B., Mohanty, S. P., Kougianos, E., and Varanasi, M. 2006a. VLSI architecture for encryption and watermarking units towards the making of a secure digital camera. In *Proceedings of the IEEE International SOC Conference (SOCC)*. 141–144.
- Adamo, O. B., Mohanty, S. P., Kougianos, E., Varanasi, M., and Cai, W. 2006b. VLSI architecture and FPGA prototyping of a digital camera for image security and authentication. In *Proceedings of the IEEE Region 5 Technology and Science Conference*. 154–158.
- Alfonsi, B. 2005. I want my IPTV: Internet protocol television predicted a winner. *IEEE Distrib. Syst. Online*.
- Bhargava, B., Shi, C., and Wang, S. 2004. MPEG video encryption algorithms. *Multimedia Tools Appl. 24*, 3, 57–79.
- Blythe, P. and Fridrich, J. 2004. Secure digital camera. In *Proceedings of Digital Forensic Research Workshop (DFRWS)*.
- Chau, R., Kavalieros, J., Roberds, B., Schenker, R., Lionberger, D., Barlage, D., et al. 2000. 30nm physical gate length CMOS transistors with 1.0ps n-MOS and 1.7ps p-MOS gate delays. *IEDM Tech. Digest*, 45–48.
- Cherry, S. 2005. The battle for broadband [Internet protocol television]. *IEEE Spectrum*.
- Choi, R., Onishi, K., Kang, C., Gopalan, S., Nieh, R., Kim, Y., et al. 2002. Fabrication of high quality ultra-thin HfO<sub>2</sub> gate dielectric MOSFETs using deuterium anneal. *IEDM Tech. Digest*, 613–616.
- Cox, I. J. and Miller, M. L. 2002. Electronic watermarking: The first 50 years. *EURASIP J. Appl. Signal Process.* 2, 126–132.
- Emmanuel, S. and Kankanhalli, M. S. 2003. A digital rights management scheme for broadcast video. *ACM-Springer Verlag Multimedia Syst. J. 8*, 6, 444–458.
- Eskicioglu, A. M. and Delp, E. J. 2001. An overview of multimedia content protection in consumer electronics devices. *Elsevier Signal Processing: Image Comm. 16*, 681–699.
- Friedman, G. L. 1993. The trustworthy digital camera: Restoring credibility to the photographic image. *IEEE Trans. Consumer Electron.* 39, 4, 905–910.
- Ghai, D., Mohanty, S. P., and Kougianos, E. 2008. A dual oxide CMOS universal voltage converter for power management in multi- $V_{DD}$  SoCs. In *Proceedings of the 9th IEEE International Symposium on Quality Electronic Design*. 257–260.
- Ghai, D., Mohanty, S. P., and Kougianos, E. 2009a. Unified P4 (power-performance-process-parasitic) fast optimization of a nano-CMOS VCO. In *Proceedings of the 19th ACM/IEEE Great Lakes Symposium on VLSI (GLSVLSI)*. 303–308.
- Ghai, D., Mohanty, S. P., Kougianos, E., and Patra, P. 2009b. A PVT aware accurate statistical logic library for high-κ metal-gate nano-CMOS. In *Proceedings of 10th International Symposium on Quality of Electronic Design (ISQED)*. 47–54.
- Hager, W. W. and Zhang, H. 2006. Algorithm 851: CG-DESCENT, a conjugate gradient method with guaranteed descent. *ACM Trans. Math. Softw. 32*, 1, 113–137.
- Hu, Z., Buyuktosunoglu, A., and Srinivasan, V. 2004. Microarchitectural techniques for power gating of execution units. In *Proceedings of the International Symposium Low Power Electronics and Design*.
- Ishihara, F. and Sheikh, F. 2004. Level conversion for dual supply systems. *IEEE Trans. VLSI Syst. 12*, 2, 185–195.
- Jain, R. 2005. I want my IPTV. *IEEE Multimedia*.
- Kanno, Y., Mizuno, H., Tanaka, K., and Watanabe, T. 2000. Level converters with high immunity to power-supply bouncing for high-speed sub-1-V LSIs. In *Proceedings of the Symposium on VLSI Circuits Digest of Technical Papers*. 202–203.
- Kougianos, E. and Mohanty, S. P. 2009. Impact of gate-oxide tunneling on mixed-signal design and simulation of a nano-CMOS VCO. *Elsevier Microelectronics J. 40*, 1, 95–103.
- Kougianos, E., Mohanty, S. P., and Mahapatra, R. N. 2009. Hardware assisted watermarking for multimedia. *Elsevier Int. J. Comput. Electr. Engin. 35*, 2, 339–358. SI Circuits and Systems for Real-Time Security and Copyright Protection of Multimedia.
- Kounavis, M. E., Kumar, A., Vin, H., Yavatkar, R., and Campbell, A. T. 2003. Directions in packet classification for network processors. In *Proceedings of the 2nd Workshop on Network Processors*.
- Kulkarni, S. H. and Sylvester, D. 2003. Fast and energy-efficient asynchronous level converters for multi-VDD design. In *Proceedings of the IEEE International Systems-on-Chip Conference*. 169–172.
- Macq, B. M. and Quisquater, J. J. 1995. Cryptography for digital TV broadcasting. *Proc. IEEE* 83, 6, 944–957.
- ACM Journal on Emerging Technologies in Computing Systems, Vol. 6, No. 2, Article 8, Publication date: June 2010.

- Mandal, S. K., Bhojwani, P., Mohanty, S. P., and Mahapatra, R. N. 2008. IntellBatt: Towards smarter battery design. In *Proceedings of the 45th ACM/IEEE Design Automation Conference (DAC)*. 872–877.
- Mathai, N. J., Kundur, D., and Sheikholeslami, A. 2003a. Hardware implementation perspectives of digital video watermarking algorithms. *IEEE Trans. Signal Process.* 51, 4, 925–938.
- Mathai, N. J., Sheikholeslami, A., and Kundur, D. 2003b. VLSI implementation of a real-time video watermark embedder and detector. In *Proceedings of the IEEE International Symposium on Circuits and Systems*. 772–775.
- Memon, N. and Wong, P. W. 1998. Protecting digital media content. *Comm. ACM* 41, 7, 35–43.
- Mohanty, S. P., Adamo, O. B., and Kougianos, E. 2007a. VLSI architecture of an invisible watermarking unit for a biometric-based security system in a digital camera. In *Proceedings of the 25th IEEE International Conference on Consumer Electronics (ICCE)*. 485–486.
- Mohanty, S. P., Ghai, D., and Kougianos, E. 2010. A P4VT (power-performance-process-parasitic-voltage-temperature) aware dual-$V_{Th}$ nano-CMOS VCO. In *Proceedings of the 23rd IEEE International Conference on VLSI Design (ICVD)*.
- Mohanty, S. P., Ghai, D., Kougianos, E., and Joshi, B. 2009a. A universal level converter towards the realization of energy efficient implantable drug delivery nano-electro-mechanical-systems. In *Proceedings of the International Symposium on Quality Electronic Design*. 673–679.
- Mohanty, S. P., Ghai, D., Kougianos, E., and Patra, P. 2009b. A combined packet classifier and scheduler towards net-centric multimedia processor design. In *Proceedings of the 25th IEEE International Conference on Consumer Electronics (ICCE)*. 11–12.
- Mohanty, S. P. and Kougianos, E. 2007. Simultaneous power fluctuation and average power minimization during nano-CMOS behavioural synthesis. In *Proceedings of the 20th IEEE International Conference on VLSI Design*. 577–582.
- Mohanty, S. P., Ranganathan, N., and Balakrishnan, K. 2006. A dual voltage-frequency VLSI chip for image watermarking in DCT domain. *IEEE Trans. Circ. Syst. II* 53, 5, 394–398.
- Mohanty, S. P., Ranganathan, N., Kougianos, E., and Patra, P. 2008. *Low-Power High-Level Synthesis for Nanoscale CMOS Circuits*. Springer.
- Mohanty, S. P., Ranganathan, N., and Namballa, R. 2005. A VLSI architecture for visible watermarking in a secure still digital camera (S<sup>2</sup>DC) design. *IEEE Trans. VLSI Syst.* 13, 8, 1002–1012.
- Mohanty, S. P., Ranganathan, N., and Namballa, R. K. 2003. VLSI implementation of invisible digital watermarking algorithms towards the development of a secure JPEG encoder. In *Proceedings of the IEEE Workshop on Signal Processing Systems*. 183–188.
- Mohanty, S. P., Ranganathan, N., and Namballa, R. K. 2004. VLSI implementation of visible watermarking for a secure still camera design. In *Proceedings of the International Conference on VLSI Design*. 1063–1068.
- Mohanty, S. P., Vadlamudi, S. T., and Kougianos, E. 2007b. A universal voltage level converter for multi-Vdd based low-power nano-CMOS systems-on-chips (SoCs). In *Proceedings of the 13th NASA Symposium on VLSI Design*. 2.2.
- Mukherjee, V., Mohanty, S. P., and Kougianos, E. 2005. A dual dielectric approach for performance aware gate tunneling reduction in combinational circuits. In *Proceedings of the 23rd IEEE International Conference on Computer Design (ICCD)*. 431–436.
- Nelson, G. R., Jullien, G. A., and Pecht, O. Y. 2005. CMOS image sensor with watermarking capabilities. In *Proceedings of the IEEE International Symposium on Circuits and Systems* (ISCAS). 5326–5329.
- Nourani, M. and Faezipour, M. 2006. A single-cycle multi-match packet classification engine using TCAMs. In *Proceedings of the IEEE Symposium on High Performance Interconnects*. 73–78.
- Rabaey, J. M., Chandrakasan, A., and Nikolić, B. 2003. *Digital Integrated Circuits*, 2nd Ed. Prentice-Hall.
- Richardson, I. E. G. 2003. *H.264 and MPEG-4 Video Compression*. Wiley & Sons.
- Sadeghi, K., Emadi, M., and Farbiz, F. 2006. Using level restoring method for dual supply voltage. In *Proceedings of the 19th International Conference on VLSI Design*. 601–605.
- Sanchez, H., Siegel, J., Nicoletta, C., Nissen, J. P., and Alvarez, J. 1999. A versatile 3.3/2.5/1.8-V CMOS I/O driver built in a 0.2-μm, 3.5-nm Tox, 1.8-V CMOS technology. *IEEE J. Solid State Circ.* 34, 11, 1501–1511.
- Sikora, T. 1997. The MPEG-4 video standard verification model. *IEEE Trans. Circ. Syst. Video Technol.* 7, 1, 19–31.
- Sill, F., You, J., and Timmerman, D. 2007. Design of mixed gates for leakage reduction. In *Proceedings of the 17th Great Lakes Symposium on VLSI*. 263–268.
- Staples, M., Daniel, K., Cima, M., and Langer, R. 2006. Application of micro- and nanoelectromechanical devices to drug delivery. *Pharm. Res.* 23, 5, 847–863.
- Tarigopula, S. 2008. A CAM based high-performance classifier scheduler for a video network processor. M.S. thesis, University of North Texas.
- Vadlamudi, S. T. 2007. A nano-CMOS based universal voltage level converter for multi-VDD SoCs. M.S. thesis, Department of Computer Science and Engineering, University of North Texas.
- Wei, L., Chen, Z., Roy, K., Johnson, M. C., Ye, Y., and De, V. K. 1999. Design and optimization of dual-threshold circuits for low-voltage low-power applications. *IEEE Trans. VLSI Syst.* 7, 1, 16–24.
- Wolbring, G. Nanoscale drug delivery systems. http://www.innovationwatch.com/choiceisyours/choiceisyours-2007-12-15.htm.
- Xu, J. and Lipton, R. J. 2002. On fundamental tradeoffs between delay bounds and computational complexity in packet scheduling algorithms. In *Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications*. 15–28.
- Yu, C. C., Wang, W. P., and Liu, B. D. 2001. A new level converter for low power applications. In *Proceedings of the IEEE International Symposium on Circuits and Systems*. 113–116.
- Yuan, C. P. and Chen, Y. C. 2005. A voltage level converter circuit design with low-power consumption. In *Proceedings of the 6th International Conference on ASIC*. 309–310.
- Zhang, L. L., Beacham, B., Hashemi, M. R., Chow, P., and Leon-Garcia, A. 2000. A scheduler ASIC for a programmable packet switch. *IEEE Micro* 20, 1, 42–48.
- Zhao, W. and Cao, Y. 2006. New generation of predictive technology model for sub-45nm design exploration. In *Proceedings of the International Symposium on Quality Electronic Design*. 585–590.

Received October 2009; revised January 2010; accepted February 2010